Expression quantitative trait loci (eQTL) analysis is a powerful approach
for identifying genetic loci associated with quantitative changes in gene
expression. We applied genome-wide association analysis to a data set of
>300 000 single-nucleotide polymorphisms and >48 000 mRNA expression
phenotypes obtained by Illumina microarray profiling of 149 human surgical
liver samples from Caucasian donors with detailed medical
documentation. Of 1226 significant associations, only 200 were validated
by comparison with a previously published similar study. Potential
explanations for the low replication rate include differences in microarray
platforms, statistical modeling, the covariates considered, and the origin
and collection procedures of the tissues. Focused analysis revealed a subset
of 95 associations related to absorption, distribution, metabolism and
excretion of drugs. Of these, 21 were true replications and 74 were newly
discovered associations in enzymes, transporters, transcriptional regulators
and other genes. This study extends our knowledge of the genetics of
inter-individual variability of gene expression, with particular emphasis on
pharmacogenomics.
Genetic variants can affect qualitative and quantitative aspects at all levels of
gene expression, including gene transcription, splicing, transcript stability, rate
of translation, protein function and degradation, thereby contributing to
inter-subject variability and heritable metabolic, pharmacogenetic and other
phenotypes. Many variants, in particular common single-nucleotide polymorphisms
(SNPs), affect gene expression in a quantitative manner, and the combination of
larger sets of low-impact variants is believed to explain non-Mendelian types
of inheritance, including complex quantitative traits such as body size. 1-3 Typical
pharmacological phenotypes, such as drug response and toxicity, are highly
likely to depend on multiple genes. In contrast to monogenically inherited
pharmacogenetic polymorphisms, most of which have been discovered by
following up on unusual clinical drug response phenotypes, 4 the basis for more
complex phenotypes has remained largely unknown. 5,6
A relatively new approach to identify unknown functional
genetic variants that modulate gene expression, also termed
'genetical genomics,' is the mapping of expression quantitative
trait loci (eQTLs) using genome-wide association
(GWA) methods in cohorts of unrelated individuals. 2,7 In
this strategy, individual transcript levels are determined in a
selected tissue or cell type using microarrays. In genomic
DNA of the same individuals, on the order of 10⁵ to 10⁶ SNPs
are genotyped in parallel. By considering each individual
gene transcript as a quantitative trait, association analysis
identifies SNPs that are significantly associated with
expression. 8,9 Thus, the eQTL strategy differs from typical
GWA studies, as the majority of the >1000 published
GWA studies focused on a single or only a few
complex phenotypes. 10

So far, only a limited number of genome-wide eQTL
studies have been performed on various human tissues. 11-19
In most cases, easily accessible peripheral tissues such as
human HapMap lymphoblastoid cell lines, lymphocytes or
monocytes were investigated. For example, in one of the
earliest studies, Morley et al. 20 distinguished cis- and
trans-effects, depending on the location of the trait gene and
the SNP gene relative to each other. Several later studies found that
trans-eQTLs were more difficult to reproduce. 12-15 Only a few
studies have appeared on internal tissues, including the
brain, 21 adipose 18 and liver. 16 The latter study investigated a
cohort of 427 human liver samples (in this paper referred to
as the 'Seattle study') and found a multitude of new eQTLs.
Furthermore, its authors showed that the eQTL approach together
with network analyses can drive the identification of new
susceptibility gene loci for complex disease traits such as
type 1 diabetes. 16

In this study, we investigated 149 human livers surgically 
removed from Caucasian donors to identify statistically 
significant associations between genetic polymorphisms 
and mRNA expression levels at a genome-wide scale. 
Although our study was similar in design and technology 
to the former study, 16 the set of human liver samples had no 
overlap and differed in many aspects including ethnicity, 
sampling procedures, availability and completeness of 
clinical data. We focus in this paper on genes involved in 
absorption, distribution, metabolism and excretion (ADME) 
of drugs to allow for more detailed analyses, which resulted 
in a smaller set (~20%) of truly replicated eQTLs and a 
larger set (~80%) of unique eQTLs. This demonstrated that 
the genetical genomics approach is useful to identify novel 
genotype-phenotype relationships, and that a single study 
is insufficient to uncover all existing eQTLs in a given tissue. 



Materials and methods 

Liver samples 

Liver tissues and corresponding blood samples were previously
collected from 150 patients of Caucasian ethnicity
(71 males and 79 females) undergoing liver surgery at the
Campus Virchow (University Medical Center Charité,
Humboldt University, Berlin, Germany). The average age
of the subjects was 58 ± 14 years. This study was approved
by the ethics committees of the medical faculties of the
Charité, Humboldt University, and of the University of
Tuebingen and conducted in accordance with the Declaration
of Helsinki. All tissue samples were examined by a
pathologist and only histologically non-tumorous tissues
were used. Clinical patient documentation available for all
samples and shown to have significant influence on the
analysis included age, sex, medical diagnosis (primary or
secondary liver tumor, other diagnosis), presurgical medication
(regular drug treatment before surgery vs no drugs),
cholestatic liver injury (based on liver function tests 22 ) and
alcohol drinking and smoking habits. Patients with hepatitis,
cirrhosis or chronic alcohol use were excluded. Detailed
information on sample metadata is given in Supplementary
Table S1.

Transcriptome analysis and genome-wide genotyping 
RNA isolation from liver tissues was performed using
Trizol (Invitrogen, Paisley, UK) extraction and the Qiagen
RNeasy-mini kit (Qiagen, Valencia, CA, USA) with on-column
DNase treatment as described previously. 23 Only
high-quality RNA preparations with an Agilent Bioanalyzer
(Nano-Lab Chip Kit, Agilent Technologies, Waldbronn,
Germany) RNA Integrity Number (RIN) >7
were used in this study. In all, 200 ng of total RNA was
amplified and labeled using the Illumina TotalPrep RNA
amplification kit (Ambion Applied Biosystems, Darmstadt,
Germany). cRNA quality was assessed by capillary electrophoresis
on an Agilent 2100 Bioanalyzer (Agilent Technologies).
Expression levels of >48 000 mRNA transcripts were
assessed using the Human-WG6v2 Expression BeadChip (Illumina,
Eindhoven, The Netherlands). Hybridization was carried
out according to the manufacturer's instructions. Genome-wide
SNP data had been generated from genomic DNA using
the HumanHap300 Genotyping BeadChip (Illumina) with
318 237 SNPs as described before. 24 A comparison with the
microarray platforms used in the Seattle study is shown in
Figure 1. All data have been deposited in NCBI's Gene
Expression Omnibus and are accessible through GEO
Series accession number GSE32504
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32504).

Preprocessing and quality control 

Illumina BeadStudio version 3.0 (Illumina, San Diego, CA, 
USA) was used for all low-level preprocessing steps of 
the expression data, including background estimation and 
correction, normalization and probe set summary. After 
these low-level preprocessing steps, 9875 genes with a high
detection P-value (>0.1) or >10% missing values were
filtered out and removed from the data set. 25 The remaining
missing signal intensities were estimated using the 'k nearest
neighbor' algorithm implemented in R Bioconductor. 26,27
The resulting data set was subsequently log2 transformed. 
Finally, after all preprocessing steps, the raw data of 48 701 
probe signal intensities were mapped and reduced to signal 
intensities corresponding to 15 439 unique genes. 
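
The filtering and transformation steps described above can be illustrated with a minimal Python sketch. It is not the BeadStudio or Bioconductor implementation used in the study. The function names, the per-probe summary detection P-value and the toy intensities are assumptions made for illustration, and the k-nearest-neighbor imputation step is omitted.

```python
import math
from typing import List, Optional, Sequence

Intensity = Optional[float]  # None marks a missing signal intensity


def keep_probe(intensities: Sequence[Intensity],
               detection_p: float,
               max_detection_p: float = 0.1,
               max_missing_fraction: float = 0.10) -> bool:
    """Return True if a probe passes both filters named in the text.

    Assumption: one summary detection P-value per probe is available.
    """
    missing_fraction = sum(v is None for v in intensities) / len(intensities)
    return detection_p <= max_detection_p and missing_fraction <= max_missing_fraction


def log2_transform(intensities: Sequence[Intensity]) -> List[Intensity]:
    """Log2-transform the observed intensities, leaving missing values untouched."""
    return [None if v is None else math.log2(v) for v in intensities]


# Hypothetical probe measured in 10 samples, one of them missing.
probe = [128.0, 256.0, None, 512.0, 128.0, 256.0, 512.0, 128.0, 256.0, 512.0]
if keep_probe(probe, detection_p=0.01):
    print(log2_transform(probe))
```

In the study itself, missing values that survived the filter were imputed before the log2 transformation; the sketch simply carries them through as `None`.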

Figure 1 Distribution of SNPs among the different genotyping platforms used
in the Seattle and Stuttgart studies (figure labels: Seattle, Stuttgart).
SNP, single-nucleotide polymorphism.

Raw data preprocessing of the HumanHap300 Genotyping
BeadChip was also performed using BeadStudio version 3.0.
Next, missing genotypes were estimated using the MACH
imputation algorithm, which is based on a hidden Markov
model. 28 Subsequently, 15 235 SNPs with an extremely low
call rate (<95%), 29 3466 SNPs with low minor allele
frequencies (<4%) 16 and 201 SNPs not in Hardy-Weinberg
equilibrium (false discovery rate ≤0.2) were
excluded from further analyses. 16,30 Genetic similarity between
samples, referred to as population substructure, may
lead to false-positive association results. 31 To identify possibly
related individuals, we calculated pairwise identity-by-state
distances. Consequently, one sample was excluded because of
>95% genotype identity to another sample. 32 To detect
further putative population substructure, the method of
Price et al. 33 based on principal components analysis (PCA)
was applied. This analysis revealed no evidence for population
substructure within our cohort of liver samples. This
comprehensive quality control analysis was performed using
the R BioConductor package 'GenABEL'. 34 The final processed
data set was from 149 livers (71 males and 78 females,
Supplementary Table S1) and consisted of 299 352 SNPs and
15 439 gene expression levels. As a further internal control, we
performed genotyping of sex chromosome-specific amelogenin
gene variants 35 and analysis of sex-specific gene
expression (for example, XIST, RPS4Y, SMCY), which revealed
100% agreement with patient documentation.
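
The per-SNP quality criteria named above (call rate, minor allele frequency and Hardy-Weinberg equilibrium) can be sketched as follows. This is a simplified illustration, not the GenABEL procedure used in the study. The genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call), the function names and the toy data are assumptions, and the Hardy-Weinberg check uses a plain chi-square test with one degree of freedom rather than the false discovery rate criterion applied in the paper.

```python
import math
from typing import Optional, Sequence

Genotype = Optional[int]  # copies of the counted allele (0, 1, 2); None = no call


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples with a genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def hwe_chi2_pvalue(genotypes: Sequence[Genotype]) -> float:
    """P-value of a 1-df chi-square test against Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)
    q = 1.0 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_frequency_filters(genotypes: Sequence[Genotype],
                             min_call_rate: float = 0.95,
                             min_maf: float = 0.04) -> bool:
    """Apply the call-rate and MAF thresholds quoted in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNP: 50 + 40 + 10 called genotypes and 4 missing calls.
snp = [0] * 50 + [1] * 40 + [2] * 10 + [None] * 4
print(round(call_rate(snp), 3), minor_allele_frequency(snp),
      round(hwe_chi2_pvalue(snp), 3), passes_frequency_filters(snp))
```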

GWA analysis

GenABEL 34 was used to test all 4.6 billion possible
combinations of SNPs and expression traits (299 352
SNPs × 15 439 traits) for significant associations. Here, we
assumed a genetic model in which both alleles contribute to
gene expression in an additive manner, because this has
been shown to be one of the most powerful statistical
approaches. 36 The SNP-trait associations were adjusted for
sex, age, smoking, alcohol consumption, diagnosis,
C-reactive protein level, cholestatic liver disease and
presurgical medication as covariates (Supplementary Table S1),
using an additive linear model.
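
As an illustration of the additive model, the following sketch regresses an expression trait on the allele count (0, 1 or 2) together with covariates by ordinary least squares. It is not GenABEL code. The function names and toy data are hypothetical, and only the per-allele effect and its t statistic are returned, because the Python standard library offers no t distribution from which to derive a P-value.

```python
import math
from typing import List, Sequence, Tuple


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def additive_eqtl_fit(expression: Sequence[float],
                      allele_counts: Sequence[int],
                      covariates: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (per-allele effect, t statistic) for expression ~ 1 + genotype + covariates."""
    rows = [[1.0, float(g)] + [float(v) for v in cov]
            for g, cov in zip(allele_counts, covariates)]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, r))
                 for r, y in zip(rows, expression)]
    sigma2 = sum(e * e for e in residuals) / (n - k)
    # Column 1 of (X'X)^-1 gives the variance factor of the genotype coefficient.
    unit = [1.0 if i == 1 else 0.0 for i in range(k)]
    var_slope = sigma2 * solve_linear_system(xtx, unit)[1]
    return beta[1], beta[1] / math.sqrt(var_slope)


# Hypothetical data: 12 donors, one covariate (age in years).
genotypes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
ages = [40, 50, 60, 70, 45, 55, 65, 75, 42, 52, 62, 72]
noise = [0.02, -0.02, -0.02, 0.02] * 3
log2_expr = [2.0 + 0.5 * g + 0.01 * a + e for g, a, e in zip(genotypes, ages, noise)]

slope, t_stat = additive_eqtl_fit(log2_expr, genotypes, [[a] for a in ages])
print(f"per-allele effect: {slope:.3f}, t = {t_stat:.1f}")
```

In a genome-wide scan this fit would be repeated for every SNP-trait pair, which is why dedicated packages use much faster score tests than the direct solution shown here.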

Bonferroni's multiple testing correction 

All traits were tested for associations to cis- or trans-acting
SNPs individually. Study-wise individual cutoff levels were
calculated to test for cis- and trans-associations. To derive a
study-wise cis-cutoff level at significance level α = 0.05, we
first determined all cis-acting SNPs within the data set. To
allow for detailed comparisons of our results with those
of Schadt et al., 16 we used the same definitions, that is,
cis-acting SNPs are those that occur within 1 Mb upstream
or downstream of the known or predicted 5'- and 3'-ends
of any trait gene. Thus, the study-wise Bonferroni-adjusted
P-value threshold equals 0.05/2 607 664 = 1.92 × 10⁻⁸, where
2 607 664 is the total number of possible cis-associations
based on the above definition. Accordingly, the study-wise
trans-specific Bonferroni cutoff level was computed from the
total number of tests for trans-associations, that is, 0.05/4.6
billion = 1.08 × 10⁻¹¹.
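
The two cutoff levels follow directly from the test counts given in the text, as the short Python calculation below reproduces. The variable and function names are chosen for illustration only.

```python
# Counts taken from the text above.
N_SNPS = 299_352
N_TRAITS = 15_439
N_CIS_TESTS = 2_607_664
ALPHA = 0.05


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_all_tests = N_SNPS * N_TRAITS  # every SNP tested against every trait
cis_cutoff = bonferroni_cutoff(ALPHA, N_CIS_TESTS)
trans_cutoff = bonferroni_cutoff(ALPHA, n_all_tests)

print(f"all SNP-trait tests: {n_all_tests:,}")
print(f"cis cutoff:   {cis_cutoff:.2e}")
print(f"trans cutoff: {trans_cutoff:.2e}")
```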

Further analysis of SNPs, haplotypes and genetic linkage 
Information on SNPs was obtained from the dbSNP
database 37 (http://preview.ncbi.nlm.nih.gov/snp). Polymorphisms
occurring in an eQTL were tested for pairwise linkage
to SNPs in the neighborhood by tagging specific blocks of
genetic linkage using Haploview 38 (V4.2, HapMap V3R2) or
SNAP 39 (http://www.broadinstitute.org/mpg/snap; using
HapMap release 22 and 1000 Genomes Pilot 1) for CEU
samples each. Pairwise r² was used as the measure of linkage
disequilibrium (LD). The genomic regions of trait genes were
screened for previously described copy number variations
using the genome variation database from the Center of
Applied Genomics (http://projects.tcag.ca/variation/).
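
For orientation, pairwise r² can be approximated from unphased genotypes as the squared Pearson correlation of allele counts. The sketch below uses this approximation and is not the haplotype-based calculation performed by Haploview or SNAP. The function names and toy genotypes are hypothetical. The 0.8 threshold in the helper is the one this study applies when classifying an association as replicated by LD.

```python
from typing import Sequence


def genotype_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Squared Pearson correlation between two vectors of allele counts (0, 1, 2)."""
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


def is_tight_linkage(r2: float, threshold: float = 0.8) -> bool:
    """True if r2 exceeds the threshold used for replication by LD."""
    return r2 > threshold


# Hypothetical genotypes of six individuals at three SNPs.
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [2, 1, 0, 1, 2, 0]   # same information, opposite allele counted
snp3 = [0, 2, 1, 1, 2, 0]

print(genotype_r2(snp1, snp2), is_tight_linkage(genotype_r2(snp1, snp3)))
```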

Results 

GWA analysis and extraction of ADME genes
We used additive linear models in GenABEL 34 to detect
associations between SNPs and expression traits, and we
considered a comprehensive set of available clinical data as
covariates (Supplementary Table S1). In total, we identified
1226 significant associations (1179 cis and 47 trans) between
1163 SNPs and 371 different expression traits (Figure 2,
Supplementary Table S2).

As we were particularly interested in genes relevant to
drug disposition and toxicity, we filtered the identified eQTLs
according to their potential relevance in the context of
ADME processes. We compiled an ADME gene set comprising
682 genes (Supplementary Table S3) from various
resources including the PharmaADME Working Group list
of ADME genes (http://pharmaadme.org/), the NURSA
(Nuclear Receptor Signaling Atlas) Consortium 40 and the
PharmGKB knowledge base 41 (http://www.pharmgkb.org/).
By requiring either the SNP gene or the trait gene or both to be an
ADME gene, we identified a total of 95 significant associations, of
which 89 were cis- or presumably cis-acting and 6 were trans-
or presumably trans-acting. Table 1 shows 21 associations
that could be validated by comparison with the Seattle
study (see below), and Table 2 shows the other 74 ADME
associations uniquely identified in our study.

Comparison and validation of eQTLs

To investigate reproducibility of our eQTL results, we
performed a detailed comparison with the Seattle study, 16
which differed in several important aspects including
technology platform, origin and number of samples analyzed
and available medical information. Surprisingly, only
200 of all 1226 associations (16%) were also found in the
Seattle study when Bonferroni's adjustment was applied to
both studies (Figure 2). A more detailed analysis showed that
30 of the 200 replicated associations were exact validations,
that is, the same SNPs were found in both studies to be
associated with the same trait, whereas 170 SNPs were
validated by LD, that is, they were found to be in tight linkage
(r² >0.8) with SNPs of the Seattle study. The remaining
1026 SNPs represented independent associations only found
in our study (Supplementary Table S2). Remarkably, none of
the trans-associations in either study was replicated by the
other (Figure 2). Reanalysis of the data after omitting most of
the covariates except those considered in the former study
revealed 1313 associations, whereas still only ~20% of all
associations were found in both studies (data not shown).

Figure 2 Venn diagram of eQTL results in the Stuttgart and Seattle
studies. Comparison of significant genetic associations detected in the
two studies after Bonferroni's correction. Numbers in the upper box
refer to genome-wide associations (GWAs), whereas numbers in the
lower box refer to ADME genes. The number of associated expression
traits is given in brackets. ADME, absorption, distribution, metabolism
and excretion; eQTL, expression quantitative trait loci.

                Seattle only (n=427)   Both studies   Stuttgart only (n=149)
GWA   cis       1229 (1182)            200 (108)      979 (328)
GWA   trans     136 (135)              0 (0)          47 (15)
ADME  cis       44 (38)                21 (11)        68 (30)
ADME  trans     4 (4)                  0 (0)          6 (1)

Investigation of ADME eQTLs

Table 1 summarizes 21 of the 95 ADME associations that
represent replications, including 3 exactly replicated eQTLs
and 18 matches by LD, together mapping to 11 different
trait genes.

For example, two SNPs in a haplotype block at the CYP3A
locus on chromosome 7 were strongly correlated with
increased expression of CYP3A5 in heterozygous carriers (see
box plot in Figure 3). One of them (rs10242455) represents
an exact replication and the other (rs1859690) a replication by
LD (Table 1). The two SNPs were completely linked to each
other and to rs776746, the causative SNP of the well-investigated
CYP3A5*3 splice variant. 42,43 Box plots showing
the genotype-phenotype correlation for one representative
SNP of each trait are compiled in Figure 3. Further analyses
regarding these replicated results are presented in Supplementary
Results and Discussion and in Supplementary Table S4.

Table 1 Significant eQTL associations filtered for ADME genes and validated by comparison with the Seattle study

Trait     Trait       SNP (Stuttgart)                    SNP         SNP gene              Association   ADME        SNP         Association   LD (r²)
gene      chromosome                                     chromosome                        P-value       assignment  (Seattle)   P-value
                                                                                           (Stuttgart)                           (Seattle)
CYP3A5    7           rs10242455                         7           CYP3A5                3.36e-10      T,S         rs10242455  3.35e-22      Exact
GSTM3     1           rs11101992                         1           GSTM5                 9.73e-14      T,S         rs11101992  3.35e-28      Exact
SQSTM1    5           rs565280                           5           SQSTM1                4.11e-09      T,S         rs565280    2.3e-20       Exact
ABCC11    16          rs11861379, rs8056100              16          ABCC11                <4.14e-15     T,S         rs16946122  1.31e-12      0.925
ARNT      1           rs10888390, rs4970986, rs4451553   1           CTSS, SETDB1          <6.48e-11     T           rs10888395  3.88e-11      0.8
CYP3A5    7           rs1859690                          7           ZNF498                7.02e-10      T           rs10242455  3.35e-22      1
DHRS2     14          rs1866226                          14          DHRS2                 1.94e-11      T,S         rs1885592   1.39e-33      1
GSTT1     22          rs1007888                          22          MIF                   3.97e-15      T           rs4822458   2.69e-39      0.966
MGMT      10          rs4751104                          10          MGMT                  1.58e-10      T,S         rs2008387   3.33e-12      0.904
SLC22A3   6           rs884742                           6           SLC22A3               1.56e-09      T,S         rs518295    2.59e-19      0.839
SQSTM1    5           rs10277, rs1650893, rs1065154      5           LOC51149, SQSTM1      <1.06e-08     T           rs565280    2.3e-20       0.936
VKORC1    16          rs10871454, rs889548, rs749767     16          STX4A, MYST1, BCKDK   <5.24e-20     T           rs4889606   1.66e-23      0.931
FMO2      1           rs1795240, rs1736565               1           FMO3, FMO6            <1.32e-08     T,S         rs1795244   9.09e-17      0.846

Abbreviations: ADME, absorption, distribution, metabolism and excretion; eQTL, expression quantitative trait loci; LD, linkage disequilibrium; SNP, single-nucleotide
polymorphism.

If multiple SNPs are associated with the same trait, only the highest P-value of the respective set of SNPs is given; ADME assignment indicates whether the
trait gene (T) or the SNP gene (S) belongs to the ADME list; LD is given between SNPs of the Stuttgart and Seattle studies.

Table 2 shows the 74 ADME associations (68 cis-acting
and 6 trans-acting) uniquely identified in our study, which
mapped to 31 different trait genes. Closer analysis showed
that the corresponding SNPs had indeed been analyzed but
were not significantly associated with expression in the
former study. 16 These unique associations included additional
SNPs for traits mentioned in Table 1 but with LD of
r² <0.8 (for example, ARNT, several GSTs, VKORC1) and
additional ADME and ADME-related genes including the
histamine and diamine-oxidizing copper enzyme ABP1, the
arachidonic acid-metabolizing CYP4F12, the flavin monooxygenase
FMO4, thymidylate synthase (ENOSF1/TYMS)
involved in 5-fluorouracil response and the solute carrier
SLC22A10, among others. Furthermore, UGT1A1 expression
was associated with rs2070959 (located in UGT1A6), which
is closely linked (r² = 0.87) to rs8175347, the causal SNP of
the UGT1A1*28 allele. 44 Remarkably, this well-known polymorphism
had not been detected by the Seattle study. 16 Box
plots of these unique eQTLs represented by one SNP each are
shown in Supplementary Figure S1.

Table 2 Significant eQTL associations filtered for ADME genes and exclusively found in this study

Trait      Trait       SNP                                         SNP         SNP gene             Association  ADME
gene       chromosome                                              chromosome                       P-value      assignment
ABP1       7           rs4725367, rs6977381, rs6956432             7           HCA112, ABP1         <2.4e-09     T,S
ARHGAP10   4           rs6535579                                   4           NR3C2                1.5e-10      S
ARNT       1           rs7412746                                   1           ARNT                 2.19e-15     T,S
CAV1       7           rs1049337                                   7           CAV1                 4.13e-12     T
CHURC1     14          rs11623705, rs2296327                       14          GPX2                 <6.25e-10    T
CYP4F12    19          rs4381710, rs4312419                        19          OR10H2               <9.77e-09    S
DHRS2      14          rs7141385                                   14          DHRS2                1.41e-13     T,S
EIF5A      17          rs2292064                                   17          GPS2                 2.97e-14     T
ENOSF1     18          rs2847153                                   18          TYMS                 1.43e-11     S
FMO4       1           rs1963273, rs714839, rs7515001              1           FMO4                 <3.48e-09    T,S
GOLGB1     3           rs9884018                                   3           EAF2                 4.01e-24     S
GPX7       1           rs6588431, rs835342                         1           GPX7                 <4.82e-17    T,S
GSTM3      1           rs2274536, rs1887546, rs10735234            1           EPS8L3, GSTM3        <8.25e-09    T
GSTO2      10          rs157080, rs156699, rs156697, rs4925,       10          GSTO2, GSTO1,        <1.32e-08    T,S
                       rs12769490                                              C10ORF80
GSTT1      22          rs6003959, rs4820571, rs738809,             22          MIF, CABIN1,         <1.05e-09    T
                       rs738806, rs4822442, rs875643                           SLC2A11
HLA-DRB4   6           rs389884                                    6           STK19                2.41e-09     S
HS.563390  6           rs4715326, rs9474334, rs6917325             6           GSTA1, GSTA5         <4.85e-09    S
EXOC3      5           rs12188164                                  5           AHRR                 7.32e-10     S
MGMT       10          rs12247354, rs531572, rs4751099,            10          MGMT                 <8.55e-09    T,S
                       rs4750759
NUDT8      11          rs6591256, rs1695                           11          GSTP1                <5.9e-10     S
PSMB9      6           rs2071540                                   6           TAP1                 3.35e-11     S
SLC22A10   11          rs1201559, rs575009, rs4121881, rs494608,   11          SLC22A9, SLC22A25,   <3.08e-11    T,S
                       rs566456, rs11231409, rs7949840                         SLC22A24
SQSTM1     5           rs248248                                    5           LOC51149             6.15e-09     S
TRPC4AP    20          rs2273684, rs6088590, rs6120708             20          GSS, NCOA6           <1.1e-10     S
TTC19      17          rs2285580, rs2157991, rs178810              17          NCOR1                <1.24e-08    S
UGT1A1     2           rs2070959                                   2           UGT1A6               6.63e-10     T,S
UNQ9391    8           rs1247558, rs1406891, rs783145,             6           PLG                  <4.09e-14    S
                       rs9355841, rs13231, rs4252125
UROC1      3           rs777498, rs777499, rs812368                3           ZXDC                 <8.64e-10    T
USMG5      10          rs2486758                                   10          CYP17A1              3.39e-12     S
VKORC1     16          rs4889490, rs7294, rs12445568               16          RNF40, VKORC1,       <1.65e-09    T,S
                                                                               STX1B2
XRCC5      2           rs828704                                    2           XRCC5                3.8e-09      T,S

Abbreviations: ADME, absorption, distribution, metabolism and excretion; eQTL, expression quantitative trait loci; SNP, single-nucleotide polymorphism.

If multiple SNPs are associated with the same trait, only the highest P-value of the respective set of SNPs is given; ADME assignment indicates whether the
trait gene (T) or the SNP gene (S) belongs to the ADME list.



The only true trans-association significantly identified
among ADME genes comprised six SNPs in the plasminogen
(PLG) gene on chromosome 6, which encodes the zymogen of the
serine protease plasmin. These SNPs were associated with
expression of a distant serine protease, UNQ9391 (PRSS55), on
chromosome 8. As depicted in Figure 4, three of the variants
(rs1406891, rs783145 and rs1247558) are simultaneously locally
associated with expression of PLG itself, thus substantiating a
functional impact. Whereas rs1247558 and rs1406891 are
located in intergenic regions, rs783145 is located in an intron,
suggesting that this or a closely linked variant might be the
causative SNP.

Figure 3 Box plots of validated ADME associations. Box and whisker diagrams include smallest gene expression values, lower quartiles, medians,
upper quartiles, largest gene expression values and outliers of the 11 validated ADME associations. The size of each genotyping group is given in
brackets along the x axis. ADME, absorption, distribution, metabolism and excretion. Panel titles recoverable from the figure (SNP, trait):
rs11861379, ABCC11; rs10888390, ARNT; rs10242455, CYP3A5; rs1007888, GSTT1; rs4751104, MGMT; rs884742, SLC22A3; rs10277, SQSTM1;
rs10871454, VKORC1.

Figure 4 Manhattan plot of novel trans-eQTL and related box plots of cis/trans-associations. (a) Negative log-transformed P-values of all SNPs
tested for association with the serine protease UNQ9391 (PRSS55), which is located on chromosome 8. A set of 6 trans-acting SNPs at the plasminogen
(PLG) locus on chromosome 6 was identified to be significantly associated. The dashed line represents the P-value cutoff level for trans-associations.
The three indicated SNPs are simultaneously locally associated with the expression of PLG itself. (b) Box plots showing cis (left) and trans (right)
genotype-phenotype relationships for one selected SNP. eQTL, expression quantitative trait loci; SNP, single-nucleotide polymorphism.

Discussion

Several studies 11-19 have shown the utility of eQTL analysis
to elucidate relationships between genetic polymorphisms
and gene expression on a genome-wide scale, thus contributing
to understanding the genetic basis of inter-individual
variability and heritability of complex traits. However,
limited information is so far available regarding the completeness,
reproducibility and interpretation of such data.

Our study was performed in the human liver with a focus 
on ADME genes, which have not been given special 
attention in previous studies. It replicates a former study 
by Schadt et al. 16 (the Seattle study) in terms of general 
approach and tissue analyzed, and we therefore strived to 
compare the results of the two studies in detail. 

The fraction of replicated observations between this and 
the former study was between 16 and 20%, which included 
replications by LD. As these represent eQTLs replicated in 
two different studies, they should be highly reliable, 
justifying in-depth analysis without further validation. 
However, the large discrepancy between the two studies 
was unexpected. Likely reasons include technical differences 
between the two studies, in particular the different gene 
expression profiling platforms applied in our (Illumina) 
and the former study (Agilent Technologies), which used 
different probes for most genes. This can lead to different
expression data, given the common occurrence of variant
transcripts, SNP-interference with probe hybridization and 
inconsistent gene annotation. 45,46 Additional technical 
differences concern the genotyping arrays and differences 
in data preprocessing. 

Moreover, it should be noted that we applied an additive
genetic model, which implies a gene-dose effect but also
reveals recessive or dominant associations at lower significance.
Importantly, these three types of associations
represent the biologically most meaningful possibilities. In
contrast, the eQTL association analysis by Schadt et al. 16
was based on Kruskal-Wallis tests, that is, a codominant
model, which also reveals extreme deviations from additivity,
that is, overdominant associations in which the
two homozygous groups show similar expression but the
heterozygotes differ significantly. This (unknown) fraction
of the eQTL associations reported by Schadt et al. 16 must
therefore be expected not to be reproducible in our study.
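
The difference between the two models can be made concrete with a small, purely hypothetical example in Python. Expression values are constructed so that both homozygous groups are similar and the heterozygotes are clearly higher. The additive trend (squared correlation between allele count and expression) is then close to zero, whereas a Kruskal-Wallis test across the three genotype groups detects the difference. The Kruskal-Wallis P-value below uses the asymptotic chi-square approximation with two degrees of freedom and no tie correction, which is an assumption of this sketch rather than the procedure of either study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_p(aa: Sequence[float], ab: Sequence[float], bb: Sequence[float]) -> float:
    """Asymptotic Kruskal-Wallis P-value for three genotype groups (2 df)."""
    groups = [aa, ab, bb]
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    return math.exp(-h / 2.0)  # chi-square survival function for 2 df


def additive_r2(aa: Sequence[float], ab: Sequence[float], bb: Sequence[float]) -> float:
    """Squared correlation between allele count (0, 1, 2) and expression."""
    x = [0.0] * len(aa) + [1.0] * len(ab) + [2.0] * len(bb)
    y = list(aa) + list(ab) + list(bb)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical overdominant pattern: heterozygotes differ, homozygotes agree.
expr_aa = [1.0, 1.1, 1.2, 1.3]
expr_ab = [3.0, 3.1, 3.2, 3.3]
expr_bb = [1.05, 1.15, 1.25, 1.35]

print(f"additive r2: {additive_r2(expr_aa, expr_ab, expr_bb):.4f}")
print(f"Kruskal-Wallis P: {kruskal_wallis_p(expr_aa, expr_ab, expr_bb):.4f}")
```

An association of this shape would pass a codominant test but would be missed by an additive scan, which is the asymmetry between the two studies described above.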

Additional important differences between the two studies
relate to the number, origin and sampling of tissue, as well
as the medical information available. Sample-related differences
should be of utmost importance, although this is quite
difficult to prove as no direct comparison was possible. The
Stuttgart cohort consisted exclusively of liver tissue removed
surgically from Caucasian donors in one hospital, using
only one procedure for sample collection, freezing, storage,
RNA isolation, quality assessment, DNA isolation and
microarray analysis. Although all but one sample were
resected because of liver cancer, this fact by itself should
not affect genotype-phenotype relationships because only
non-tumorous material was analyzed. In contrast, the 427
samples of the Seattle cohort consisted mostly of postmortem
material obtained from prospective organ donors who
were presumably cancer-free, but the tissue quality may
vary more widely because of warm ischemia before preservation
and long storage times before cryopreservation.
Furthermore, the Seattle cohort was collected in three
independent centers, giving rise to differences among
samples regarding tissue acquisition and storage protocols,
criteria for RNA quality, etc. Another significant influential
factor in the Seattle study was ethnicity, 16 in contrast to
our study, in which no influence was detected using population
stratification methods. Finally, the availability of medical
information also differed between the two studies. Whereas the
former study used imputation methods to complete large
parts of the missing information, medical documentation for the
Stuttgart liver samples was almost complete and comprised a
larger number of parameters.

Taken together, the many potentially influential parameters
that differed between the two studies may well
explain why only a small fraction of eQTL results were
reproducible. Given the difficulty in validating cis-associations,
it is probably not surprising that none of the trans-associations
could be reproduced, as these are based on
additional indirect downstream effects and are therefore
generally less well replicated than cis-associations. 16,47

Pharmacologically relevant eQTLs were extracted based on a
comprehensive compilation of 682 non-overlapping ADME
genes, resulting in 89 cis- and 6 trans-associations. Similarly
to the entire eQTL set, only 3 of the 95 ADME eQTLs were
exactly replicated, and another 10 eQTLs were shown to be
replications by LD. More detailed information on all replicated
associations is presented in Supplementary Results and
Discussion. Several of the unique ADME gene associations
of our study concern known polymorphically expressed genes,
including UGT1A1, 44 CYP4F12 48 and GST subforms M3, O2
and T1. Although these were not identified in the Seattle study,
they likely relate to these biologically confirmed polymorphisms.
Except for CYP3A5 and CYP4F12, no other CYP genes
were present among the identified eQTLs. The failure to detect
the well-known polymorphic CYPs (for example, 2A6, 2B6,
2C19, 2D6) may be explained by the mechanisms involved,
which often include complex splicing events that may not
be detectable by the microarray probes. 49

During revision of this manuscript, another paper
appeared 50 that, similar to our paper, compared the results
of an eQTL association analysis in human livers (two collectives
with n1 = 206 and n2 = 60) with the Seattle study. 16
Using different validation sets, the authors found replication rates
between 49 and 58% for cis-eQTLs, that is, about half of
all cis-associations failed to be validated in their study.
A thorough investigation of the factors influencing reproducibility
revealed similar reasons as those mentioned in our
study. The lower replication rate in our study compared
with ref. 50 may be explained by more relaxed validation criteria
in their study, in particular much less stringent
multiple testing correction. Other likely reasons for
differing replication rates include the factors discussed
above, that is, origin and sampling conditions of the tissues,
statistical modeling, as well as ethnicity and other covariates
of the liver donors.

In conclusion, we carried out a GWA analysis of >300 000
SNPs and 48 000 expression phenotypes determined in a
cohort of 149 human surgical liver samples obtained from
Caucasian donors to identify genetic loci associated with
quantitative changes in gene expression. A subset of 95
significant genotype-phenotype relationships, which was
classified as ADME or ADME-related, may be of particular
relevance for drug disposition and toxicity. Detailed comparison
between this study and the similar Seattle study
demonstrated that quantitative trait loci are difficult to
reproduce for a number of technical and statistical
reasons, and that several studies are required to discover the
full extent of genetic determination in quantitative traits.
Follow-up studies to elucidate the causal variants and their
biological and pharmacological relevance should therefore
concentrate on the validated results.